Estimating the location and scale parameters is common in statistics, 
using, for instance, the well-known sample mean and standard 
deviation. However, inference can be contaminated by the presence 
of outliers if modeling is done with light-tailed distributions such as 
the normal distribution. In this paper, we study robustness to outliers 
in location-scale parameter models using both the Bayesian and 
frequentist approaches. We find sufficient conditions (e.g., on tail 
behavior of the model) to obtain whole robustness to outliers, in the 
sense that the impact of the outliers gradually decreases to nothing 
as the conflict grows infinitely. To this end, we introduce the family of 
log-Pareto-tailed symmetric distributions that belongs to the larger 
family of log-regularly varying distributions. 


1. Introduction. In Bayesian analysis, outlying observations and prior 
misspecification may contaminate the posterior inference. For instance, a 
group of observations may suggest a quite different posterior inference than 
that proposed by the prior and the rest of data. Using light-tailed 
distributions such as the normal can lead to an undesirable compromise where 
the posterior distribution concentrates on an area that is not supported by 
any sources of information. The conflict is usually resolved automatically by 
modeling with heavy-tailed distributions, in favor of the sources of 
information with the lightest tails. O’Hagan and Pericchi [16] refer to this situation 
as the theory of conflict resolution in Bayesian statistics, in their extensive 
review of the literature on that topic. 

Conflict resolution in Bayesian analysis was first described by De Finetti 
[7]. The theory has mostly been developed for location parameter inference; 
see, for instance, Dawid [6]; O’Hagan [13-15]; Angers [5]; Desgagné and 
Angers [10]; Kumar and Magnus [12]; Andrade and Omey [4]; Andrade, 
Dorea and Guevara Otiniano [1]. 

Received August 2014; revised January 2015. 

AMS 2000 subject classifications. Primary 62F35; secondary 62F15. 

Key words and phrases. Built-in robustness, outliers, theory of conflict resolution, 
Bayesian inference, log-regularly varying distributions, log-Pareto-tailed symmetric 
distributions. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2015, Vol. 43, No. 4, 1568-1595. This reprint differs from the original in 
pagination and typographic detail. 

A. DESGAGNÉ 

The theory on pure scale parameter inference was first analyzed by 
Andrade and O’Hagan [2], who considered partial robustness using regularly 
varying distributions (see also Andrade and Omey [4] and Andrade, Dorea 
and Guevara Otiniano [1], who generalize their work on partial 
robustness), and then by Desgagné [8], who considered whole robustness using 
log-exponentially varying distributions. 

Note that partial robustness exists if the conflicting values have a 
significant but limited influence on the posterior distribution, as the conflict 
grows infinitely. In contrast, whole robustness is achieved if the influence of 
the conflicting values on the posterior distribution gradually decreases to 
nothing. To illustrate this, consider the estimation of a location parameter 
for a Laplace model (with a flat prior proportional to 1). The posterior mode 
(or the maximum likelihood estimator) is then the sample median. If, for 
instance, the sample is $(10, 20, 30, 40, 50, x, x, x, x)$, and we let 
$x \to \infty$, then a wholly robust estimator of the location would be 
around 30 (the center of the nonoutlying observations), while the partially 
robust sample median estimates the location by 50, that is, the maximum of 
the nonoutliers. 
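
The median example can be checked numerically. The following is a minimal sketch in Python; the helper name `median_with_outliers` is illustrative and does not come from the paper. It shows that the sample median stays at 50 however far the four outliers move, whereas the median of the nonoutliers alone is 30.

```python
from statistics import median

# The five nonoutlying observations from the example in the text.
nonoutliers = [10, 20, 30, 40, 50]


def median_with_outliers(x, n_outliers=4):
    """Sample median of the nonoutliers plus n_outliers copies of x."""
    return median(nonoutliers + [x] * n_outliers)


# The median is bounded (partial robustness) but never returns to 30.
for x in (60, 1e3, 1e9):
    print(f"x = {x:g}: median = {median_with_outliers(x)}")

print("median of nonoutliers only:", median(nonoutliers))
```

The outliers shift the estimate from 30 to 50 and then stop having any further effect, which is the bounded but non-vanishing influence that the text calls partial robustness.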

This paper goes a step beyond the literature in that it considers 
robustness for both location and scale parameters in the same model. The only 
other paper that considers Bayesian robustness in a location-scale model is 
Andrade and O’Hagan [3]. The essential difference is that partial 
robustness to a single outlier is achieved in their paper, while whole robustness to 
multiple outliers for both location and scale estimation is obtained in this 
paper. 

Another distinctive aspect of this paper is the possibility of using the 
results of robustness in both frequentist and Bayesian approaches. Although 
the model allows us to add prior information on the location and scale 
through a very general joint prior density $\pi(\mu, \sigma)$ [essentially, we 
only require that $\sigma\pi(\mu, \sigma)$ is bounded], it is also possible to 
choose a noninformative prior such that $\pi(\mu, \sigma) \propto 1/\sigma$. 
The location and scale parameters can therefore be estimated in a robust way 
using either the Bayesian approach or a frequentist method like maximum 
likelihood estimation. 

This paper is organized as follows. In Section 2, we introduce the class 
of log-regularly varying functions because tail behavior plays a crucial role 
in the search for robustness. Essentially, this class includes functions with a 
right tail that exhibits a logarithmic decay, which can be considered a super 
heavy tail. As a result, we also define the family of log-regularly varying 
distributions. 

The model with its assumptions is described in Section 3.1, and the 
resolution of conflicts is addressed through the main results of this paper in 
Section 3.2. Two simple conditions of robustness are given. Modeling with 
a log-regularly varying distribution is the first. In the second condition, 
the number of nonoutlying observations must be larger than the maximum 
of the number of small outliers and the number of large outliers. Results of 
robustness are asymptotic, where the outlying observations tend to $-\infty$ 
or $+\infty$. Note that the asymptotic nature is about the outliers and not 
the sample size, as is usually understood. Whole robustness is expressed 
through different types of convergence of quantities, based on the complete 
sample, to quantities based only on the nonoutlying observations, resulting 
in a complete rejection of outliers. We obtain the uniform convergence of 
the posterior densities, the convergence in $L_1$, the convergence in 
distribution and the uniform convergence of the likelihoods. 

In Section 4, we introduce the family of log-Pareto-tailed symmetric 
distributions that belongs to the larger family of log-regularly varying 
distributions. It consists essentially of a symmetric density, such as the standard 
normal, with extremities replaced by log-Pareto tails, that is, with 
logarithmic decay. In the presence of outlying observations, the log-Pareto tails 
ensure robust inference. Otherwise, the estimation is practically unaffected 
by the tails and is determined mostly by the chosen symmetric density. 

In Section 5, we show that even if the results are asymptotic, they are 
still useful in practice with data. We first illustrate the threshold feature 
in Section 5.1. When an observation moves away from the nonconflicting 
values, its influence on the inference first increases gradually up to a certain 
threshold. The conflict then begins, and the model resolves it by 
progressively reducing the influence of the moving observation (now an outlier) to 
nothing. This built-in feature is attractive in practice in that conflict is 
managed in a sensitive and automatic way. In Section 5.2, concurrent estimators 
are compared under different scenarios through simulations of observations 
to find how they perform in the presence, or absence, of outlying 
observations. Nonrobust, partially and wholly robust modeling is considered. We 
conclude in Section 6, and some proofs are given in Section 7. 

2. Log-regularly varying functions. As mentioned in the Introduction, 
tail behavior is crucial for robust modeling. Hence, we introduce the class 
of log-regularly varying functions, as defined in Desgagné [8], following the 
idea of regularly varying functions developed by Karamata [11]. For each 
function in Section 2, say g, we assume that g(z) is continuous and strictly 
positive for z larger than or equal to a certain constant. 

Definition 1 (Log-regularly varying function). We say that a 
measurable function $g$ is log-regularly varying at $\infty$ with index 
$\rho \in \mathbb{R}$, written $g \in L_\rho(\infty)$, if 

$\forall \varepsilon > 0$, $\forall \tau > 1$, there exists a constant 
$A(\varepsilon, \tau) > 0$ such that 

$$z > A(\varepsilon, \tau) \text{ and } 1/\tau \le \nu \le \tau \;\Rightarrow\; |\nu^{\rho} g(z^{\nu})/g(z) - 1| < \varepsilon.$$

If $\rho = 0$, $g$ is said to be log-slowly varying at $\infty$. 

In other words, $g \in L_\rho(\infty)$ if $g(z^{\nu})/g(z)$ converges to 
$\nu^{-\rho}$ uniformly in any set $\nu \in [1/\tau, \tau]$ (for any 
$\tau > 1$) as $z \to \infty$. The pointwise convergence for any $\nu > 0$ 
follows. 

Note that if we define the function $h(z) = g(e^z)$, or equivalently 
$g(z) = h(\log z)$, we have $g \in L_\rho(\infty)$ if and only if $h$ is 
regularly varying at $\infty$ with index $-\rho$, because 
$\lim_{z \to \infty} h(\nu z)/h(z) = \nu^{-\rho}$. Therefore, we can obtain 
different results directly from the theory of regularly varying functions. For 
instance, the functions $\log(\log z)$ and 1 are both log-slowly varying at 
$\infty$ since $\log z$ and 1 are slowly varying. 
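
As a numerical illustration of Definition 1, the sketch below evaluates $|\nu^{\rho} g(z^{\nu})/g(z) - 1|$ for the assumed test function $g(z) = (\log z)^{-\rho} \log(\log z)$ with $\rho = 2$. The function, the value of $\rho$ and the names `RHO`, `log_g` and `deviation` are illustrative choices, not taken from the paper. Everything is computed in terms of $u = \log z$ so that very large $z$ can be handled without overflow.

```python
import math

RHO = 2.0  # illustrative tail index (an assumption for this sketch)


def log_g(u):
    """log g(z) as a function of u = log z, for g(z) = (log z)^(-RHO) * log(log z)."""
    return -RHO * math.log(u) + math.log(math.log(u))


def deviation(u, nu):
    """|nu^RHO * g(z^nu) / g(z) - 1| with log z = u, so that log(z^nu) = nu * u."""
    log_ratio = RHO * math.log(nu) + log_g(nu * u) - log_g(u)
    return abs(math.exp(log_ratio) - 1.0)


# The worst deviation over a few nu in [1/2, 2] shrinks as log z grows.
for u in (1e2, 1e4, 1e8, 1e16):
    worst = max(deviation(u, nu) for nu in (0.5, 0.75, 1.5, 2.0))
    print(f"log z = {u:.0e}: max deviation = {worst:.4f}")
```

For this choice of $g$ the deviation equals $|\log \nu| / \log(\log z)$, so the convergence is real but very slow, which is typical of log-slowly varying factors.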

Proposition 1 (Equivalence). For any $\rho \in \mathbb{R}$, we have 
$g \in L_\rho(\infty)$ if and only if there exists a constant $A > 1$ and a 
function $s \in L_0(\infty)$ such that for $z > A$, $g$ can be written as 

$$g(z) = (\log z)^{-\rho} s(z).$$

Proof. It is well known that if a function $h$ is regularly varying at 
$\infty$ with index $-\rho$, it can be represented as 
$h(z) = z^{-\rho} \ell(z)$, where $\ell$ is some slowly varying function. It 
is equivalent to say that $g \in L_\rho(\infty)$, where 

$$g(z) = h(\log z) = (\log z)^{-\rho} \ell(\log z) = (\log z)^{-\rho} s(z),$$

with $s(z) = \ell(\log z) \in L_0(\infty)$. □ 

The next proposition establishes the asymptotic dominance of a 
logarithmic function over a log-slowly varying function. 

Proposition 2 (Dominance). If $s \in L_0(\infty)$ and 
$g \in L_\rho(\infty)$, then for all $\delta > 0$, there exists a constant 
$A(\delta) > 1$ such that $z > A(\delta) \Rightarrow$ 

$$(\log z)^{-\delta} < s(z) < (\log z)^{\delta} \quad\text{and}\quad (\log z)^{-\rho-\delta} < g(z) < (\log z)^{-\rho+\delta}.$$

Proof. It is well known that if $\ell$ is slowly varying, then for every 
$\delta > 0$, we have $z^{-\delta} \ell(z) \to 0$ and 
$z^{\delta} \ell(z) \to \infty$ as $z \to \infty$. It follows that 
$z^{-\delta} < \ell(z) < z^{\delta}$ for $z$ sufficiently large. If we replace 
$z$ by $\log z$ and we set $s(z) = \ell(\log z)$, then $s \in L_0(\infty)$, 
and we obtain that $(\log z)^{-\delta} s(z) \to 0$ and 
$(\log z)^{\delta} s(z) \to \infty$ as $\log z \to \infty$ (or equivalently 
$z \to \infty$) and $(\log z)^{-\delta} < s(z) < (\log z)^{\delta}$ for $z$ 
sufficiently large. Since we can write $g(z) = (\log z)^{-\rho} s(z)$, the 
second part of the proposition follows directly. □ 

The index $\rho$ can be interpreted as a measure of the tail’s thickness or 
as a tail index, which is useful for the ordering of different tails. The 
function with the smallest tail index $\rho$ has the heaviest tail. More 
formally, we can verify that if $g_1 \in L_{\rho_1}(\infty)$ and 
$g_2 \in L_{\rho_2}(\infty)$, then 
$\rho_1 > \rho_2 \Rightarrow g_1(z)/g_2(z) \to 0$ as $z \to \infty$. The tail 
index $\rho$ is also useful to determine if $(1/z)g(z)$ is integrable, where 
$g(z) \in L_\rho(\infty)$, as described in the next proposition. 

Proposition 3 (Integrability). If $g(z) \in L_\rho(\infty)$, then there 
exists a constant $A > 0$ such that $(1/z)g(z)$ is integrable on $z > A$, if 
and only if: 

(i) $\rho > 1$, or 

(ii) $\rho = 1$, with the log-slowly varying part of $g(z)$ having a 
sufficiently fast decay [e.g., faster than $(\log(\log z))^{-\beta}$, with 
$\beta > 1$]. 

Proof. If we define $h$ such that $g(z) = h(\log z)$, and we choose $A$ 
sufficiently large, then $h$ is regularly varying at $\infty$ with index 
$-\rho$, and we have 

$$\int_A^\infty (1/z) g(z)\, dz = \int_A^\infty (1/z) h(\log z)\, dz = \int_{\log A}^\infty h(u)\, du = \int_{\log A}^\infty u^{-\rho} \ell(u)\, du,$$

where $\ell$ is slowly varying. For any $\delta > 0$, if $A$ is sufficiently 
large, we have $u^{-\delta} < \ell(u) < u^{\delta}$. Therefore, the integral 
exists if $\rho > 1$ and does not if $\rho < 1$. 

If $\rho = 1$, we see that the decay of $\ell$ determines the existence of 
the integral. If, for instance, $\ell(u) < (\log u)^{-\beta}$ or 
$s(z) = \ell(\log z) < (\log(\log z))^{-\beta}$, with $\beta > 1$ and 
$s \in L_0(\infty)$, then the integral exists. Instead, if 
$\ell(u) > (\log u)^{-\beta}$ or 
$s(z) = \ell(\log z) > (\log(\log z))^{-\beta}$, with $\beta < 1$ and 
$s \in L_0(\infty)$, then the integral does not exist. □ 
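
As a worked special case of Proposition 3 (an illustration added here under stated assumptions, not part of the original proof), take the log-slowly varying part to be constant in the first display, and take the boundary case $\rho = 1$ with an iterated-logarithm factor in the second. Both integrals then have closed forms that match conditions (i) and (ii).

```latex
% Assumption: g(z) = (\log z)^{-\rho}, that is, s(z) = 1, and A > 1.
\int_A^\infty \frac{1}{z} (\log z)^{-\rho} \, dz
  = \int_{\log A}^\infty u^{-\rho} \, du
  = \begin{cases}
      \dfrac{(\log A)^{1-\rho}}{\rho - 1} & \text{if } \rho > 1, \\[2ex]
      \infty & \text{if } \rho \le 1.
    \end{cases}

% Boundary case \rho = 1 with s(z) = (\log(\log z))^{-\beta} and A > e,
% using the substitution v = \log(\log z), dv = dz / (z \log z):
\int_A^\infty \frac{1}{z \log z} (\log(\log z))^{-\beta} \, dz
  = \int_{\log(\log A)}^\infty v^{-\beta} \, dv
  = \begin{cases}
      \dfrac{(\log(\log A))^{1-\beta}}{\beta - 1} & \text{if } \beta > 1, \\[2ex]
      \infty & \text{if } \beta \le 1.
    \end{cases}
```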

In particular, if $f$ is a continuous symmetric probability density function 
defined on $\mathbb{R}$ such that $g(z) = z f(z) \in L_\rho(\infty)$, we know 
from Proposition 3 that a tail index $\rho > 1$ is sufficient to guarantee 
that $f$ is proper and that $\rho \ge 1$ is a necessary condition. This leads 
us to the next definition. 

Definition 2 (Log-regularly varying distribution). A random variable 
$Z$ and its distribution are said to be log-regularly varying with index 
$\rho > 1$ if their symmetric density $f$ is such that 
$z f(z) \in L_\rho(\infty)$. 

Using Propositions 1 and 2, this means that for all $\delta > 0$ and $|z|$ 
larger than a certain constant, the symmetric (with respect to 0) density $f$ 
of a log-regularly varying distribution with index $\rho$ can be written as 
$f(z) = (1/|z|)(\log |z|)^{-\rho} s(|z|)$, where $s \in L_0(\infty)$ can be 
bounded by $(\log |z|)^{-\delta}$ and $(\log |z|)^{\delta}$. Such a density 
with logarithmic decaying tails can be referred to as a super heavy-tailed 
distribution. 

In the next proposition, we see the asymptotic impact of a location-scale 
transformation on a log-regularly varying function $g$ and the density $f$ of 
a log-regularly varying distribution. Mostly, it is another way to express 
tail thickness. 

Proposition 4 (Location-scale transformation). If 
$g(z) = z f(z) \in L_\rho(\infty)$, then we have, as $z \to \infty$, 

$$g((z - \mu)/\sigma)/g(z) \to 1 \quad\text{and}\quad (1/\sigma) f((z - \mu)/\sigma)/f(z) \to 1,$$

uniformly on $(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$, 
for any $\lambda > 0$ and $\tau > 1$. 

Proof. Using $g \in L_\rho(\infty)$ with Proposition 1, there exists a 
function $s \in L_0(\infty)$ such that $g(z) = (\log z)^{-\rho} s(z)$, if $z$ 
is large enough. Therefore, for any chosen $\lambda > 0$ and $\tau > 1$, if 
$z$ is sufficiently large, we have 

$$\frac{g((z - \mu)/\sigma)}{g(z)} = \left( \frac{\log((z - \mu)/\sigma)}{\log z} \right)^{-\rho} \frac{s((z - \mu)/\sigma)}{s(z)}.$$

It is purely algebraic to show that the term 
$\log((z - \mu)/\sigma)/\log z$ converges to 1 uniformly on any set 
$(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$ as 
$z \to \infty$. 

Finally, we want to show that $s((z - \mu)/\sigma)/s(z)$ converges to 1 
uniformly on any set 
$(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$ as 
$z \to \infty$, or equivalently that $s(y)/s(z)$ converges to 1 uniformly on 
$y \in [(z - \lambda)/\tau, (z + \lambda)\tau]$. We observe that for any 
chosen $\lambda > 0$ and $\tau > 1$, if $z$ is sufficiently large, we have 

$$z^{1/2} < (z - \lambda)/\tau < (z + \lambda)\tau < z^{2}.$$

Therefore, it suffices to show that $s(y)/s(z)$ converges to 1 uniformly on 
$y \in [z^{1/2}, z^{2}]$, or equivalently, that $s(z^{\nu})/s(z)$ converges 
to 1 uniformly on any set $\nu \in [1/2, 2]$, which is the case since 
$s \in L_0(\infty)$. The second part of the proposition follows directly. □ 

3. Resolution of conflicts in a location scale parameter model. 

3.1. Model. 

(i) Let $X_1, \ldots, X_n$ be $n$ random variables conditionally independent 
given $\mu$ and $\sigma$ with their conditional densities given by 

$$X_i \mid \mu, \sigma \overset{\mathcal{D}}{\sim} (1/\sigma) f((x_i - \mu)/\sigma);$$

(ii) the joint prior density of $\mu$ and $\sigma$ is given by 
$\mu, \sigma \overset{\mathcal{D}}{\sim} \pi(\mu, \sigma)$, where $n > 2$, 
$x_1, \ldots, x_n, \mu \in \mathbb{R}$, $\sigma > 0$. 

We assume that the prior $\pi(\mu, \sigma)$ is nonnegative on $\mathbb{R}$, 
and the only other required assumption is that $\sigma\pi(\mu, \sigma)$ is 
bounded. Note that in particular, if we have no prior information or if we 
use the model in a frequentist approach, then we set 
$\pi(\mu, \sigma) \propto 1/\sigma$, an improper joint prior density which 
can be considered as noninformative. 

We assume that $f$ is a proper density that is continuous and strictly 
positive on $\mathbb{R}$. In addition, we assume it is symmetric with respect 
to the origin. We also assume that both tails of $|z| f(z)$ are monotonic, 
which means that the tails of $f(z)$ are also monotonic. Note that 
monotonicity of the tails of $f(z)$ and $|z| f(z)$ means that there exists a 
constant $M > 0$ such that 

(1) $|y| > |z| > M$ implies that $f(y) < f(z)$ and $|y| f(y) < |z| f(z)$. 

It follows that $f(z)$ and $|z| f(z)$ are bounded on the real line, with a 
limit of 0 in their tails as $|z| \to \infty$. Hence, considering also the 
prior, we can define the constant $B$ as follows: 

(2) $B = \max\{\sup_{z} f(z),\ \sup_{z} |z| f(z),\ \sup_{\mu, \sigma} \sigma\pi(\mu, \sigma)\}$. 

These conditions are referred to below as the conditions of regularity on 
$f$. The density $f$ can possess other parameters than location and scale, 
such as a shape parameter, but they are assumed to be known. 

We study robustness of the estimation of $\mu$ and $\sigma$ in the presence 
of outliers. The nature of the results is asymptotic, in the sense that some 
$x_i$ are going to $-\infty$ or $+\infty$. We want to find sufficient 
conditions to obtain whole robustness, that is, a complete rejection of the 
outliers. 

Among the $n$ observations, denoted by 
$\mathbf{x}_n = (x_1, \ldots, x_n)$, we assume that $k > 2$ of them, denoted 
by the vector $\mathbf{x}_k$, form a group of nonoutlying observations. We 
assume that $l$ of them are considered as left outliers (smaller than the 
nonoutliers) and $r$ of them are considered as right outliers (larger than 
the nonoutliers), with $k + l + r = n$. 

For $i = 1, \ldots, n$, we define three binary functions $k_i$, $l_i$ and 
$r_i$ as follows. If $x_i$ is a nonoutlying observation, we set $k_i = 1$; if 
it is a left outlier, we set $l_i = 1$; and if it is a right outlier, we set 
$r_i = 1$. These functions are set to 0 otherwise. We have 
$k_i + l_i + r_i = 1$ for $i = 1, \ldots, n$, with 
$\sum_{i=1}^n k_i = k$, $\sum_{i=1}^n l_i = l$ and $\sum_{i=1}^n r_i = r$. 

We assume that each outlier is going to $-\infty$ or $+\infty$ at its own 
specific rate, to the extent that the ratio of two outliers is bounded. We 
can write 

$$x_i = a_i + b_i \omega,$$

for $i = 1, \ldots, n$, where $a_i$ and $b_i$ are some constants such that 
$a_i \in \mathbb{R}$ and: 

(i) $b_i = 0$ if $k_i = 1$; 

(ii) $b_i < 0$ if $l_i = 1$; 

(iii) $b_i > 0$ if $r_i = 1$; 

and we let $\omega \to \infty$. Note that if multiple outliers share the same 
$b_i$, they move as a block at the same rate. 

Let the joint posterior density of $\mu$ and $\sigma$ be denoted by 
$\pi(\mu, \sigma \mid \mathbf{x}_n)$ and the marginal density of 
$X_1, \ldots, X_n$ be denoted by $m(\mathbf{x}_n)$, with 

$$\pi(\mu, \sigma \mid \mathbf{x}_n) = [m(\mathbf{x}_n)]^{-1} \pi(\mu, \sigma) \prod_{i=1}^{n} (1/\sigma) f((x_i - \mu)/\sigma).$$

Let the joint posterior density of $\mu$ and $\sigma$ considering only the 
nonoutlying observations $\mathbf{x}_k$ be denoted by 
$\pi(\mu, \sigma \mid \mathbf{x}_k)$ and its corresponding marginal density 
be denoted by $m(\mathbf{x}_k)$, with 

$$\pi(\mu, \sigma \mid \mathbf{x}_k) = [m(\mathbf{x}_k)]^{-1} \pi(\mu, \sigma) \prod_{i=1}^{n} [(1/\sigma) f((x_i - \mu)/\sigma)]^{k_i}.$$

The likelihood functions can be found by setting 
$\pi(\mu, \sigma) \propto 1/\sigma$ and letting 
$\mathcal{L}(\mu, \sigma \mid \mathbf{x}_n) \propto \sigma\pi(\mu, \sigma \mid \mathbf{x}_n)$ 
and 
$\mathcal{L}(\mu, \sigma \mid \mathbf{x}_k) \propto \sigma\pi(\mu, \sigma \mid \mathbf{x}_k)$. 

Proposition 5. Considering the Bayesian context given in Section 3.1, 
the joint posterior densities $\pi(\mu, \sigma \mid \mathbf{x}_k)$ and 
$\pi(\mu, \sigma \mid \mathbf{x}_n)$ are proper. 

The proof of Proposition 5 is given in Section 7. 

3.2. Resolution of conflicts. The results of robustness are now given. 


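Theorem 1. Consider the model and context described in Section 3.1, 
and assume that the conditions of regularity on $f$ are satisfied. If we have: 

(i) $z f(z) \in L_\rho(\infty)$ [$z f(z)$ is log-regularly varying at 
$\infty$ with index $\rho > 1$], 

(ii) $k > \max(l, r)$, 

then we obtain the following results: 

(a) 
$$\lim_{\omega \to \infty} \frac{m(\mathbf{x}_n)}{\prod_{i=1}^{n} [f(x_i)]^{l_i + r_i}} = m(\mathbf{x}_k).$$

(b) 
$$\lim_{\omega \to \infty} \pi(\mu, \sigma \mid \mathbf{x}_n) = \pi(\mu, \sigma \mid \mathbf{x}_k),$$
uniformly on $(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$, 
for any $\lambda > 0$ and $\tau > 1$. 

(c) 
$$\lim_{\omega \to \infty} \int_0^\infty \int_{-\infty}^{\infty} |\pi(\mu, \sigma \mid \mathbf{x}_n) - \pi(\mu, \sigma \mid \mathbf{x}_k)|\, d\mu\, d\sigma = 0.$$

(d) As $\omega \to \infty$, 
$$\mu, \sigma \mid \mathbf{x}_n \overset{\mathcal{D}}{\to} \mu, \sigma \mid \mathbf{x}_k,$$
and in particular 
$$\mu \mid \mathbf{x}_n \overset{\mathcal{D}}{\to} \mu \mid \mathbf{x}_k \quad\text{and}\quad \sigma \mid \mathbf{x}_n \overset{\mathcal{D}}{\to} \sigma \mid \mathbf{x}_k.$$

(e) 
$$\lim_{\omega \to \infty} \mathcal{L}(\mu, \sigma \mid \mathbf{x}_n) = \mathcal{L}(\mu, \sigma \mid \mathbf{x}_k),$$
uniformly on $(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$, 
for any $\lambda > 0$ and $\tau > 1$. 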


Proof of result (a) is substantial and therefore is given in Section 7. This 
is, however, the crucial part in the proof of Theorem 1. 


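Proof of result (b). Consider $(\mu, \sigma)$ such that 
$\pi(\mu, \sigma) > 0$ [the proof for the case $(\mu, \sigma)$ such that 
$\pi(\mu, \sigma) = 0$ is trivial]. We have, as $\omega \to \infty$, 

$$\frac{\pi(\mu, \sigma \mid \mathbf{x}_n)}{\pi(\mu, \sigma \mid \mathbf{x}_k)} = \frac{m(\mathbf{x}_k)}{m(\mathbf{x}_n)} \cdot \frac{\pi(\mu, \sigma) \prod_{i=1}^{n} (1/\sigma) f((x_i - \mu)/\sigma)}{\pi(\mu, \sigma) \prod_{i=1}^{n} [(1/\sigma) f((x_i - \mu)/\sigma)]^{k_i}}$$

$$= \frac{m(\mathbf{x}_k)}{m(\mathbf{x}_n)} \prod_{i=1}^{n} [(1/\sigma) f((x_i - \mu)/\sigma)]^{l_i + r_i}$$

$$= \frac{m(\mathbf{x}_k) \prod_{i=1}^{n} [f(x_i)]^{l_i + r_i}}{m(\mathbf{x}_n)} \prod_{i=1}^{n} \left[ \frac{(1/\sigma) f((x_i - \mu)/\sigma)}{f(x_i)} \right]^{l_i + r_i} \to 1.$$

The first part of the last term does not depend on $\mu$ and $\sigma$ and 
converges to 1 as $\omega \to \infty$, using result (a). The second part of 
the last term also converges to 1 uniformly in any set 
$(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$ using 
Proposition 4. Furthermore, since $f$ and $\sigma\pi(\mu, \sigma)$ are 
bounded, $\pi(\mu, \sigma \mid \mathbf{x}_k)$ is also bounded on any set 
$(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$. Then we have 

$$|\pi(\mu, \sigma \mid \mathbf{x}_n) - \pi(\mu, \sigma \mid \mathbf{x}_k)| = \pi(\mu, \sigma \mid \mathbf{x}_k) \left| \frac{\pi(\mu, \sigma \mid \mathbf{x}_n)}{\pi(\mu, \sigma \mid \mathbf{x}_k)} - 1 \right| \to 0 \quad\text{as } \omega \to \infty.$$

□ 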


Proofs of results (c) and (d). We can use Scheffé’s theorem [17] 
directly to prove results (c) and (d). Using Proposition 5, we know that 
$\pi(\mu, \sigma \mid \mathbf{x}_k)$ and $\pi(\mu, \sigma \mid \mathbf{x}_n)$ 
are proper. Using result (b), we have that 
$\pi(\mu, \sigma \mid \mathbf{x}_n) \to \pi(\mu, \sigma \mid \mathbf{x}_k)$ 
pointwise as $\omega \to \infty$ for any $\mu \in \mathbb{R}$ and 
$\sigma > 0$, as a result of the uniform convergence. The conditions of 
Scheffé’s theorem are then satisfied, and we obtain the convergence in $L_1$ 
given in result (c) as well as the following result: 

$$\lim_{\omega \to \infty} \iint_E \pi(\mu, \sigma \mid \mathbf{x}_n)\, d\mu\, d\sigma = \iint_E \pi(\mu, \sigma \mid \mathbf{x}_k)\, d\mu\, d\sigma,$$

uniformly for all rectangles $E$ in $\mathbb{R} \times \mathbb{R}^{+}$. □ 


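Proof of result (e). It suffices to write the likelihood functions as 
$\mathcal{L}(\mu, \sigma \mid \mathbf{x}_n) \propto \sigma\pi(\mu, \sigma \mid \mathbf{x}_n)$ 
and 
$\mathcal{L}(\mu, \sigma \mid \mathbf{x}_k) \propto \sigma\pi(\mu, \sigma \mid \mathbf{x}_k)$ 
with $\pi(\mu, \sigma) \propto 1/\sigma$, and result (e) follows directly 
from result (b). □ 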
An attractive feature of Theorem 1 is the simplicity of its only two 
sufficient conditions. Condition (i) says that modeling must be done using 
the density $f$ of a log-regularly varying distribution with index 
$\rho > 1$; see Definition 2. Note that it involves only the tails of the 
function $|z| f(z)$. Essentially, the decay of the tails must be logarithmic. 
For that purpose, in the next section we introduce the family of 
log-Pareto-tailed symmetric distributions that belongs to the family of 
log-regularly varying distributions. 

Condition (ii) requires that $k > l$ and $k > r$. For instance, a group of 
$k = 6$ nonoutlying observations is sufficient to ensure the rejection of 
$l = 5$ outliers at left and $r = 5$ at right. The nonoutlying group must be 
the largest of the three groups, which is rather intuitive. The most 
demanding case occurs when all outliers are on the same side (e.g., 
$l = 0$). Condition (ii) can then be written as $k > n/2$, which means that 
the nonoutliers must represent more than half of the sample. A few numerical 
simulations tend to confirm our expectation that a larger difference between 
$k$ and $\max(l, r)$ results in a faster rejection of the outliers. 
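
Condition (ii) is a simple counting rule, and a short sketch may make the two cases discussed above concrete. The function name `condition_ii_holds` is a hypothetical helper introduced only for this illustration.

```python
def condition_ii_holds(k, l, r):
    """Condition (ii) of Theorem 1: the nonoutlying group outnumbers each outlier group."""
    return k > max(l, r)


# Two-sided example from the text: 6 nonoutliers against 5 left and 5 right outliers.
print(condition_ii_holds(k=6, l=5, r=5))   # True

# One-sided case (l = 0): the rule reduces to k > n/2 with n = k + r.
for k, r in ((6, 5), (5, 5)):
    n = k + r
    print(k, r, condition_ii_holds(k, 0, r), k > n / 2)
```

In the one-sided case the two printed booleans always agree, since $k > r$ is equivalent to $k > (k + r)/2$.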

The asymptotic behavior of the marginal $m(\mathbf{x}_n)$ is given in 
result (a). This fundamental result is probably of more theoretical than 
practical interest, since its main role is to lead to results (b) to (e). The 
asymptotic behavior of the posterior density is given in result (b). The 
posterior considering the entire sample converges to the posterior 
considering only the $k$ nonoutlying observations, uniformly in any set 
$(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$. The outliers 
are then completely rejected as they are going to plus or minus infinity. We 
also obtain the pointwise convergence. 

In result (c), we obtain the convergence in $L_1$ of the posterior densities 
considering the entire sample to the posterior considering only the 
nonoutlying observations. In result (d), we obtain the convergence in 
distribution, that is, $\Pr(\mu, \sigma \in E \mid \mathbf{x}_n)$ converges 
to $\Pr(\mu, \sigma \in E \mid \mathbf{x}_k)$ as $\omega \to \infty$, 
uniformly for all rectangles $E$ in $\mathbb{R} \times \mathbb{R}^{+}$. 
Because the convergence is uniform, this is actually a stronger result than 
the convergence in distribution, which requires only pointwise convergence. 
We also obtain the convergence in distribution of the posterior marginal 
distributions. Therefore, any estimation of $\mu$ and $\sigma$ based on 
posterior quantiles or Bayesian credible intervals is robust to outliers. 

In result (e), the likelihood considering the entire sample converges to 
the likelihood considering only the nonoutlying observations, uniformly in 
any set $(\mu, \sigma) \in [-\lambda, \lambda] \times [1/\tau, \tau]$. It 
follows that the maximum of $\mathcal{L}(\mu, \sigma \mid \mathbf{x}_n)$ 
converges to the maximum of $\mathcal{L}(\mu, \sigma \mid \mathbf{x}_k)$, and 
therefore the maximum likelihood estimates also converge, as 
$\omega \to \infty$. 

4. The family of log-Pareto-tailed symmetric distributions. As stated in 
Theorem 1, modeling with a log-regularly varying distribution is one of the 
conditions of robustness. However, such a distribution is super heavy-tailed, 
and the usual densities defined on $\mathbb{R}$ are light or heavy-tailed. 
Therefore, we introduce in this section the family of log-Pareto-tailed 
symmetric distributions that belongs to the larger family of log-regularly 
varying distributions. Given that the conditions of robustness involve only 
the tails of density $f(z)$, the proposed solution consists in altering a 
symmetric density, such as the usual normal, uniform or Student’s t 
distributions, by replacing its extremities with log-Pareto tails, that is, a 
function proportional to $|z|^{-1} (\log |z|)^{-\beta}$ with $\beta > 1$. 
This idea comes from the generalized exponential power (GEP) distribution, a 
family introduced by Angers [5] and revisited in more detail by Desgagné and 
Angers [9]. The GEP density is essentially a uniform density in the center 
with a large spectrum of tail behavior, classified in types I to V, from 
light to super heavy-tailed. In particular, the GEP of type V is a 
log-regularly varying distribution because its density has log-Pareto tails. 
We propose here to generalize the GEP distribution of type V to the family 
of log-Pareto-tailed symmetric distributions by using any symmetric density 
in the center instead of limiting the choice to the uniform density. 


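Definition 3. A random variable $Z$ has a log-Pareto-tailed symmetric 
distribution if its density is given by 

$$f(z \mid \phi, \alpha, \beta) = K_{(\phi,\alpha,\beta)} \left( g(z \mid \phi)\, 1_{[-\alpha,\alpha]}(z) + g(\alpha \mid \phi)\, \frac{\alpha}{|z|} \left( \frac{\log \alpha}{\log |z|} \right)^{\beta} 1_{(\alpha,\infty)}(|z|) \right),$$

where $z \in \mathbb{R}$, $\alpha > 1$, $\beta > 1$, $1_A(\cdot)$ is an 
indicator function, and $g(\cdot \mid \phi)$ is any density that is symmetric 
with respect to the origin, continuous and strictly positive on 
$[-\alpha, \alpha]$, with its vector of parameters given by 
$\phi \in \Phi$. The normalizing constant is given by 

$$K_{(\phi,\alpha,\beta)} = \frac{\beta - 1}{(2G(\alpha \mid \phi) - 1)(\beta - 1) + 2 g(\alpha \mid \phi)\, \alpha \log \alpha},$$

where $G(\alpha \mid \phi) = \int_{-\infty}^{\alpha} g(u \mid \phi)\, du$. 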


In particular, if $g(z \mid \phi)$ is a normal density, we say that the 
random variable $Z$ has a log-Pareto-tailed normal distribution. If 
$g(z \mid \phi)$ is a Student’s t density, we say that $Z$ has a 
log-Pareto-tailed Student’s t distribution, and so on. The core of the 
density $f(z \mid \phi, \alpha, \beta)$ is located between $-\alpha$ and 
$\alpha$, and the tails are positioned in the area $|z| > \alpha$. Tail 
thickness is controlled with the parameter $\beta$. This density satisfies 
the condition of robustness required in Theorem 1, since for 
$|z| > \alpha$, we have 

$$|z| f(z \mid \phi, \alpha, \beta) \propto (\log |z|)^{-\beta} \in L_\beta(\infty).$$

All conditions of regularity assumed in Section 3.1 are satisfied as well. 
The density $f(z \mid \phi, \alpha, \beta)$ is continuous and strictly 
positive on $\mathbb{R}$, proper (see Proposition 3) and symmetric with 
respect to the origin. Furthermore, both tails of 
$|z| f(z \mid \phi, \alpha, \beta)$ are monotonic. 
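
To make Definition 3 concrete, here is a minimal sketch of the density for the special case where $g$ is the standard normal density. It uses only the Python standard library; the names `lpt_normal_pdf` and `total_mass` are illustrative and not from the paper. The second function is a sanity check: it integrates the core numerically and adds the two tail masses, which have the closed form $K g(\alpha) \alpha \log \alpha / (\beta - 1)$ each when the tail term of Definition 3 is integrated.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def lpt_normal_pdf(z, alpha, beta):
    """Log-Pareto-tailed density of Definition 3 with g taken as the N(0, 1) density."""
    g_alpha = STD_NORMAL.pdf(alpha)
    core_mass_of_g = 2.0 * STD_NORMAL.cdf(alpha) - 1.0
    # Normalizing constant K from Definition 3.
    k = (beta - 1.0) / (
        core_mass_of_g * (beta - 1.0) + 2.0 * g_alpha * alpha * math.log(alpha)
    )
    az = abs(z)
    if az <= alpha:
        return k * STD_NORMAL.pdf(z)  # core: proportional to g
    # Tails: logarithmic decay anchored at g(alpha) so the density is continuous.
    return k * g_alpha * (alpha / az) * (math.log(alpha) / math.log(az)) ** beta


def total_mass(alpha, beta, steps=20000):
    """Midpoint-rule mass of the core plus the closed-form mass of both tails."""
    h = 2.0 * alpha / steps
    core = h * sum(
        lpt_normal_pdf(-alpha + (i + 0.5) * h, alpha, beta) for i in range(steps)
    )
    k_times_g_alpha = lpt_normal_pdf(alpha, alpha, beta)
    one_tail = k_times_g_alpha * alpha * math.log(alpha) / (beta - 1.0)
    return core + 2.0 * one_tail


print(total_mass(alpha=1.96, beta=4.08))
```

The printed value should be 1 up to quadrature error, for any admissible $\alpha > 1$ and $\beta > 1$.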

In practice, choosing parameters $\alpha$ and $\beta$ directly is not 
necessarily an intuitive task. It could be easier to choose other indirect 
but related quantities. Here is an interesting strategy in five steps. A 
practitioner first chooses a preferred symmetric density $g(z \mid \phi)$ and 
its vector of parameters $\phi$ (other than the location and scale 
parameters $\mu$ and $\sigma$, which will be added later), such as the 
$N(0, 1)$. The second step consists in setting the normalizing constant 
$K_{(\phi,\alpha,\beta)}$ to 1. The desirable consequence is that the core 
(between $-\alpha$ and $\alpha$) of the density 
$f(z \mid \phi, \alpha, \beta)$ becomes exactly the density 
$g(z \mid \phi)$, the familiar density of the user. The third step consists 
in choosing the mass of the core, which is defined as 

$$q = \Pr(-\alpha < Z < \alpha \mid \phi, \alpha, \beta).$$

For instance, we could choose $q = 0.95$, which leaves 2.5% of the mass 
in each tail. Then, the density $f(z \mid \phi, \alpha, \beta)$ would be 
exactly the $N(0, 1)$ density for 95% of its mass located in the center. The 
following steps are done automatically. Given that 
$K_{(\phi,\alpha,\beta)}$ has been set to 1, it follows that 
$q = 2G(\alpha \mid \phi) - 1$. However, to ensure that $\alpha > 1$ as 
required, we must choose $q > 2G(1 \mid \phi) - 1$. Rearranging the equality 
$q = 2G(\alpha \mid \phi) - 1$ leads us to the fourth step, which consists in 
calculating $\alpha$ as follows: 

$$\alpha = G^{-1}\left( \frac{1 + q}{2} \,\Big|\, \phi \right).$$

For example, a $N(0, 1)$ with $q = 0.95$ generates a value of 
$\alpha = 1.96$. Finally, we calculate $\beta$ in the fifth step as follows: 

$$\beta = 1 + \frac{2 g(\alpha \mid \phi)\, \alpha \log \alpha}{1 - q}.$$

Note that this equation is consistent with a normalizing constant of 1, and 
it satisfies $\beta > 1$ since $\alpha > 1$. Our example gives a value of 
$\beta = 4.08$. 
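
The fourth and fifth steps can be reproduced in a few lines. The sketch below assumes the standard normal as the core density and uses `statistics.NormalDist` from the Python standard library; the function name `lpt_normal_parameters` is illustrative.

```python
import math
from statistics import NormalDist


def lpt_normal_parameters(q):
    """Return (alpha, beta) for a log-Pareto-tailed N(0, 1) with core mass q and K = 1."""
    std_normal = NormalDist(0.0, 1.0)
    q_min = 2.0 * std_normal.cdf(1.0) - 1.0  # q must exceed this so that alpha > 1
    if not q_min < q < 1.0:
        raise ValueError(f"q must lie in ({q_min:.4f}, 1)")
    # Step 4: invert q = 2 G(alpha) - 1.
    alpha = std_normal.inv_cdf((1.0 + q) / 2.0)
    # Step 5: beta chosen so that the normalizing constant equals 1.
    beta = 1.0 + 2.0 * std_normal.pdf(alpha) * alpha * math.log(alpha) / (1.0 - q)
    return alpha, beta


alpha, beta = lpt_normal_parameters(0.95)
print(round(alpha, 2), round(beta, 2))  # 1.96 4.08
```

For a different core density, only the three calls on `std_normal` (its cdf, inverse cdf and pdf) would need to be replaced.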

We compare in Figure 1 the standard normal density (dashed line) to a 
log-Pareto-tailed standard normal density (solid line), with $q = 0.95$, 
$K_{(\phi,\alpha,\beta)} = 1$, $\alpha = 1.96$ and $\beta = 4.08$. Both 
densities are identical between $-\alpha$ and $\alpha$, but differ in the 
tails. 

Simulation of observations from a log-Pareto-tailed symmetric 
distribution is easy using the inverse transformation method. It is described 
in detail in Section 3.4 of Desgagné and Angers [9] for the log-Pareto-tailed 
uniform distribution (labeled GEP density of type V in their paper). It is 
straightforward to generalize it to other symmetric densities 
$g(\cdot \mid \phi)$. 
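
As an illustration of the inverse transformation method, the sketch below draws from a log-Pareto-tailed standard normal with the normalizing constant set to 1. It is derived here from Definition 3 and is not the algorithm of Desgagné and Angers [9]; the function name `sample_lpt_normal` is hypothetical. With $K = 1$, the distribution function equals the $N(0, 1)$ one on the core, and integrating the tail term of Definition 3 gives $\Pr(Z > z) = \frac{1-q}{2} (\log \alpha / \log z)^{\beta - 1}$ for $z > \alpha$, which can be inverted in closed form.

```python
import math
import random
from statistics import NormalDist


def sample_lpt_normal(q, rng):
    """One draw from a log-Pareto-tailed N(0, 1) with core mass q and K = 1."""
    std_normal = NormalDist(0.0, 1.0)
    alpha = std_normal.inv_cdf((1.0 + q) / 2.0)
    beta = 1.0 + 2.0 * std_normal.pdf(alpha) * alpha * math.log(alpha) / (1.0 - q)
    tail_mass = (1.0 - q) / 2.0

    u = rng.random()
    if tail_mass <= u <= 1.0 - tail_mass:
        # Core: the CDF coincides with the N(0, 1) CDF because K = 1.
        return std_normal.inv_cdf(u)

    left = u < tail_mass
    p = u if left else 1.0 - u  # probability mass lying beyond the draw
    if p == 0.0:
        return -math.inf if left else math.inf
    # Invert p = tail_mass * (log(alpha) / log|z|)^(beta - 1).
    log_magnitude = math.log(alpha) * (p / tail_mass) ** (-1.0 / (beta - 1.0))
    # The tails are so heavy that exp() can overflow a float; report infinity then.
    magnitude = math.exp(log_magnitude) if log_magnitude < 700.0 else math.inf
    return -magnitude if left else magnitude


rng = random.Random(2015)
draws = [sample_lpt_normal(0.95, rng) for _ in range(20000)]
alpha = NormalDist().inv_cdf(0.975)
print(sum(abs(d) <= alpha for d in draws) / len(draws))  # close to 0.95
```

The overflow guard is a practical detail worth noting: with logarithmic tail decay, draws far beyond the range of double precision occur with small but nonzero probability.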
[Figure 1: two panels. Left panel: "Density: N(0,1) vs log-Pareto-tailed N(0,1)". Right panel: "Right tail: N(0,1) vs log-Pareto-tailed N(0,1)".] 

Fig. 1. A comparison between the standard normal (dashed line) and log-Pareto-tailed 
standard normal (solid line) densities. 

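Of course, we can add location and scale parameters, denoted, 
respectively, by $\mu \in \mathbb{R}$ and $\sigma > 0$, to the density 
$f(z \mid \phi, \alpha, \beta)$. We obtain 

$$\frac{1}{\sigma} f((z - \mu)/\sigma \mid \phi, \alpha, \beta) = \begin{cases} K_{(\phi,\alpha,\beta)}\, (1/\sigma)\, g((z - \mu)/\sigma \mid \phi) & \text{if } \mu - \alpha\sigma < z < \mu + \alpha\sigma, \\ K_{(\phi,\alpha,\beta)}\, g(\alpha \mid \phi)\, \dfrac{\alpha}{|z - \mu|} \left( \dfrac{\log \alpha}{\log(|z - \mu|/\sigma)} \right)^{\beta} & \text{if } |z - \mu| > \alpha\sigma. \end{cases}$$

Note that when this density is used in the context of robustness described 
in Section 3.2, the parameters $\phi$, $\alpha$ and $\beta$ are assumed to be 
known. The inference is done on the location and scale parameters only. 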

5. Example. In this section, the asymptotic results of robustness found 
in Theorem 1 are confronted with data. Without loss of generality, we choose 
the improper and noninformative joint prior density 
$\pi(\mu, \sigma) \propto 1/\sigma$. Hence, both the Bayesian and frequentist 
approaches can be used. 

We first illustrate in Section 5.1 the behavior of different estimators of 
the location and scale parameters when one observation moves from 0 to 
100, given that the rest of data lie between —10 and 10. For the estimator 
based on robust modeling provided by Theorem 1, we observe an interesting 
feature that we call the threshold. The influence of the moving observation 
on the inference increases until a certain threshold. Then the nature of this 
observation gradually changes to become more and more outlying, as its 
influence decreases and eventually completely disappears. In Section 5.2, the 
performances of concurrent estimators are compared for different scenarios. 
We consider simulation of observations from the normal as well as from 
contaminated normal distributions, to see how the estimators perform in 
the presence—or absence—of outliers. The mean square error is calculated 
as the measure of performance. 

5.1. Illustration of the threshold. We consider a sample of size n = 22
given by $\mathbf{x}_n = (\mathbf{x}_k, \omega)$, where the k = 21 nonoutlying observations are
represented by $\mathbf{x}_k = (-10, -9, \ldots, -1, 0, 1, \ldots, 9, 10)$. We study the impact of
moving the observation $\omega$ from 0 to 100 on the location-scale parameter
inference based on the maximum likelihood estimator (MLE) calculated for
three different densities $f$, in accordance with the model described in
Section 3.1. Note that results using the Bayesian marginal posterior median are
very similar. Naturally, the standard normal density has been chosen as the
nonrobust model. The corresponding MLEs are then the usual sample mean
and (biased) sample standard deviation.
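As a quick check of the nonrobust model, the following Python sketch computes the normal-model MLE (sample mean and biased sample standard deviation) for the sample of this section with the moving observation set to 100. The variable names are illustrative only, and the factor 1.96 is the usual normal quantile.

```python
import math

# The 21 nonoutlying observations -10, ..., 10 plus the moving observation omega = 100.
data = list(range(-10, 11)) + [100]
n = len(data)

# Normal-model MLE: sample mean and biased (divide-by-n) standard deviation.
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

# Interval that the fitted normal model expects to contain 95% of observations.
lower = mu_hat - 1.96 * sigma_hat
upper = mu_hat + 1.96 * sigma_hat

print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
print(f"95% interval: ({lower:.1f}, {upper:.1f})")
```

The estimates round to 4.55 and 21.65, and the interval to about (-37.9, 47.0), which are the values reported for the normal model in this section.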

The log-Pareto-tailed standard normal density, as illustrated in Figure 1,
is also studied. We have chosen q = 0.95, $\alpha = 1.96$ and $\beta = 4.08$, as discussed
in Section 4. This modeling leads to complete rejection of the outlier, as
described by Theorem 1. We also examined other values of q (the values
of $\alpha$ and $\beta$ are calculated automatically using the proposed algorithm in
Section 4). If we choose a larger value of q, then the density is closer to
the N(0,1), and the same goes for the inference in the absence of outliers.
However, the threshold of robustness increases. The choice of 0.95 appeared
to be well balanced for good inference with and without outliers.
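The algorithm of Section 4 is not reproduced in this section. As a hedged sketch, assume that the tail beyond alpha has the log-Pareto form g(alpha) (alpha/|z|) (log alpha / log|z|)^beta, that alpha is the normal quantile leaving mass q on [-alpha, alpha], and that beta is chosen so that the density integrates to 1 without rescaling. Under these assumptions the total tail mass is 2 g(alpha) alpha log(alpha) / (beta - 1), which gives a closed form for beta. The function name below is hypothetical.

```python
import math
from statistics import NormalDist

def log_pareto_tail_params(q):
    """Return (alpha, beta) under the assumptions stated above (requires alpha > 1)."""
    std = NormalDist()                      # standard normal N(0, 1)
    alpha = std.inv_cdf((1 + q) / 2)        # central mass q lies on [-alpha, alpha]
    if alpha <= 1:
        raise ValueError("q is too small: the log-Pareto tail needs alpha > 1")
    # Assumed tail mass: 2 * g(alpha) * alpha * log(alpha) / (beta - 1) = 1 - q.
    beta = 1 + 2 * std.pdf(alpha) * alpha * math.log(alpha) / (1 - q)
    return alpha, beta

alpha, beta = log_pareto_tail_params(0.95)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

With q = 0.95 this sketch returns values that round to 1.96 and 4.08, in line with the values quoted above. It needs alpha > 1, that is, q above roughly 0.68.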

The third density $f$ considered is the Student's t, a common choice for
robust modeling. This density satisfies the conditions of robustness given
in Andrade and O'Hagan [3] (which lead to partial robustness concerning
the scale parameter), but not the conditions of whole robustness given in
Theorem 1. The number of degrees of freedom has been set to 10, again to
search for balance between good inference with and without outliers. An
implicit scale parameter of 0.964 (other than $\sigma$) has been added to match
its interquartile range to that of the two densities considered above.
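The implicit scale of 0.964 can be checked numerically. The sketch below uses only the Python standard library: it integrates the Student's t density with Simpson's rule and inverts the distribution function by bisection. The helper names are hypothetical, and the numerical scheme is a simple stand-in for a library quantile function.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of the standard Student's t with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """Distribution function for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_quantile(p, nu):
    """Quantile for p > 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Scale that matches the interquartile range of the t(10) to that of the N(0, 1).
scale = NormalDist().inv_cdf(0.75) / t_quantile(0.75, 10)
# 97.5% quantile of the rescaled t(10) density.
q975_scaled = scale * t_quantile(0.975, 10)
print(f"scale = {scale:.3f}, 97.5% quantile = {q975_scaled:.3f}")
```

The scale comes out at about 0.964. The same computation gives a 97.5% quantile of the rescaled density of approximately 2.147, the value used for the Student's t model later in this section.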

Robustness for the three models is illustrated in Figure 2. On the x-axis,
the observation $\omega$ moves from 0 to 100. The estimators $\hat\mu$ (left graph) and
$\hat\sigma$ (right graph) lie on the y-axis.

The influence of the outlier on a nonrobust inference is clearly visible in
the normal model (dashed lines) by the estimators growing indefinitely as
the outlier increases. For $\omega = 100$, we find $\hat\mu = 4.55$ and $\hat\sigma = 21.65$. Using the
normal quantile of 1.96, this model thus suggests that 95% of the observations
should be between -37.9 and 47.0, which is barely supported by data
located between -10 and 10, and not at all by the outlier $\omega = 100$.

[Figure 2. Left panel: different estimators of the location parameter as one
observation $\omega$ increases from 0 to 100. Right panel: different estimators of
the scale parameter as one observation $\omega$ increases from 0 to 100.]

Fig. 2. Estimation of the location (left graph) and scale (right graph) parameters for
the normal model (dashed lines), the log-Pareto-tailed normal model (solid lines) and the
Student's t model (dotted-dashed lines), using the MLE.

Whole robustness is illustrated by the log-Pareto-tailed normal model
(solid lines). We can see in Figure 2 that $\omega$ reaches its maximum influence
around 16, where $\hat\mu$ and $\hat\sigma$ are approximately equal to 0.8 and 7. The influence
of $\omega$ then begins to decrease after this threshold as $\hat\mu$ and $\hat\sigma$ eventually
converge to their corresponding MLE considering only the nonoutlying
observations $\mathbf{x}_k$, given by $\hat\mu = 0$ and $\hat\sigma = 6.06$. For $\omega = 100$, we find $\hat\mu = 0.05$
and $\hat\sigma = 6.28$. Using the normal quantile of 1.96 (remember that this model is
a standard normal density except for the 2.5% log-Pareto tails), this model
thus suggests that 95% of the observations should be between -12.3 and
12.4, which is wholly supported by data, if $\omega = 100$ is considered as an
outlier generated from the log-Pareto tails.
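The estimates for an observation at 100 under the log-Pareto-tailed normal model can be reproduced approximately with the sketch below. It rests on assumptions that are not taken from this section: the tail beyond alpha is taken to be g(alpha) (alpha/|z|) (log alpha / log|z|)^beta, the normalizing constant is dropped because it does not affect the maximizer, and a plain coordinate search with step halving stands in for whatever optimizer was actually used. All function names are hypothetical.

```python
import math
from statistics import median

ALPHA, BETA = 1.96, 4.08  # values quoted in the text for q = 0.95

def lptn_logpdf(z, alpha=ALPHA, beta=BETA):
    """Log-density of the assumed log-Pareto-tailed normal, up to an additive constant."""
    log_c = -0.5 * math.log(2 * math.pi)
    a = abs(z)
    if a <= alpha:
        return log_c - 0.5 * z * z  # standard normal core
    # Assumed tail: g(alpha) * (alpha/|z|) * (log(alpha)/log|z|)**beta
    return (log_c - 0.5 * alpha * alpha
            + math.log(alpha / a)
            + beta * math.log(math.log(alpha) / math.log(a)))

def loglik(mu, sigma, data):
    """Location-scale log-likelihood of the sample."""
    return sum(-math.log(sigma) + lptn_logpdf((x - mu) / sigma) for x in data)

def fit(data, mu, sigma, step=1.0, tol=1e-6):
    """Coordinate search: try +/- step on each parameter, halve the step when stuck."""
    best = loglik(mu, sigma, data)
    while step > tol:
        improved = False
        for dmu, dsig in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            m, s = mu + dmu, sigma + dsig
            if s <= 0:
                continue
            val = loglik(m, s, data)
            if val > best:
                mu, sigma, best, improved = m, s, val, True
        if not improved:
            step /= 2
    return mu, sigma

data = list(range(-10, 11)) + [100]
# Robust starting point: median and rescaled median absolute deviation.
med = median(data)
mad = 1.4826 * median(abs(x - med) for x in data)
mu_hat, sigma_hat = fit(data, med, mad)
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```

Under these assumptions the search returns estimates that round to 0.05 and 6.28, and the interval of plus or minus 1.96 estimated standard deviations around the location estimate is close to (-12.3, 12.4), in agreement with the values reported above.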

Finally, partial robustness is illustrated by the Student's t model
(dotted-dashed lines). For $\omega = 100$, we find $\hat\mu = 0.35$ and $\hat\sigma = 8.44$. Using the
appropriate quantile of 2.147, this model thus suggests that 95% of the
observations should be between -17.8 and 18.5, which is partially supported by
data located between -10 and 10. Note that as $\omega$ continues to grow beyond
100, our calculations show that $\hat\mu$ decreases toward 0, and $\hat\sigma$ continues to
grow toward an upper limit of 8.71. This indicates that location estimation
using the Student's t is wholly robust. However, scale parameter estimation
is only partially robust, in the sense that the inference is contaminated by
the outlier, but only to a certain extent.

5.2. Performance and simulations. We present here a brief study of the
performance of the three models described above (the robust
log-Pareto-tailed normal, the partially robust Student's t and the popular but
nonrobust normal distributions) under three scenarios of simulations. For each
scenario and model, a sample of size n = 30 is simulated 25,000 times, and the
location and scale parameters are estimated each time using the MLE. Note that
again, results using the Bayesian marginal posterior median are very similar.
The performance is then measured by the mean square error (MSE). For each
scenario, the true values are $\mu = 0$ and $\sigma = 1$. The MSE for the estimation
of $\mu$ and $\sigma$ are given in Tables 1 and 2, respectively.

Table 1
Mean square error for MLE of $\mu$ under different scenarios (n = 30)

                                           Scenario
Model                       100% N(0,1)    10% N(0,6)    5% N(8,1)
Log-Pareto-tailed normal       0.03           0.05          0.07
Student's t                    0.03           0.06          0.09
Normal                         0.03           0.15          0.29

In the first scenario, the samples are simulated from a N(0,1). We see
that in the absence of outliers, the three models obtain the same excellent
performance both for the estimation of the location (MSE = 0.03) and the
scale (MSE = 0.02). This is rather predictable, because the three densities
are very similar, if not identical, except for the tails. The impact of the tails
on the estimation is felt mainly in the presence of outliers.

In the second scenario, we consider a mixture of normal distributions,
where an observation has a 90% probability of being generated from a N(0,1)
and 10% from a N(0,6). A mixture of normal distributions is also studied in
the third scenario, where on average 95% of the observations are generated
from a N(0,1) and the remaining 5% from a N(8,1).

As for the estimation of $\mu$, we can see in Table 1 that both
log-Pareto-tailed normal and Student's t models give very similar MSE for the two
contaminated scenarios (0.05 to 0.09), slightly larger than those of the 100%
N(0,1) scenario without outliers. However, the normal model is clearly
affected by the outliers as its MSE increases to 0.15 and 0.29, respectively, for
the second and third scenarios.

Table 2
Mean square error for MLE of $\sigma$ under different scenarios (n = 30)

                                           Scenario
Model                       100% N(0,1)    10% N(0,6)    5% N(8,1)
Log-Pareto-tailed normal       0.02           0.11          0.09
Student's t                    0.02           0.32          0.30
Normal                         0.02           1.46          1.14
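For the normal model, the MSE of the location estimator in Table 1 does not need simulation, because the MLE is the sample mean: its MSE is the mixture variance divided by n plus the squared bias. The sketch below evaluates this for the three scenarios. It assumes that the second argument of N(., .) is a standard deviation (so N(0,6) has standard deviation 6); the function name is hypothetical.

```python
def mse_of_sample_mean(n, components, true_mu=0.0):
    """MSE of the sample mean for a normal mixture given as (weight, mean, sd) triples."""
    mix_mean = sum(w * m for w, m, _ in components)
    second_moment = sum(w * (s * s + m * m) for w, m, s in components)
    mix_var = second_moment - mix_mean ** 2
    # Variance of the mean of n i.i.d. draws plus squared bias.
    return mix_var / n + (mix_mean - true_mu) ** 2

scenarios = {
    "100% N(0,1)": [(1.0, 0.0, 1.0)],
    "10% N(0,6)": [(0.9, 0.0, 1.0), (0.1, 0.0, 6.0)],
    "5% N(8,1)": [(0.95, 0.0, 1.0), (0.05, 8.0, 1.0)],
}
mse = {name: mse_of_sample_mean(30, comps) for name, comps in scenarios.items()}
for name, value in mse.items():
    print(f"{name}: {value:.2f}")
```

Under this reading of the scenarios, the three values round to 0.03, 0.15 and 0.29, which is the Normal row of Table 1. In the third scenario more than half of the error (0.16 out of about 0.29) is squared bias, since the contaminated mean is 0.4 rather than 0.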

The picture for the estimation of $\sigma$ is a bit different, as can be seen in
Table 2. For both contaminated scenarios, the performance of the three models
can be markedly discriminated in accordance with known theory. The MSE are
around 0.10 for the robust log-Pareto-tailed normal model, around 0.30 for the
partially robust Student's t model and above 1 for the nonrobust normal
model.

6. Conclusion. Complete rejection of outliers has been investigated in a
location-scale parameter model. The analysis has been done primarily in
a Bayesian context, but it has been extended to the frequentist approach
with maximum likelihood estimators. Essentially, asymptotic robustness is
guaranteed if modeling is done using a log-regularly varying distribution
(with logarithmic tail decay) and if $k > \max(l, r)$, that is, if the number of
nonoutliers is larger than both the number of outliers at $-\infty$ and at $+\infty$.
The first condition is easy to verify because it involves only the tails of a
density through a limit; there are no integrals, derivatives or distribution
functions involved. The second condition is quite reasonable and intuitive.

We obtain the uniform convergence of the posterior density given the
complete sample to the density considering only the nonoutlying observations.
We also obtain the convergence in $L_1$, the convergence in distribution, as
well as the uniform convergence of the likelihoods. Therefore any estimation
of the location and scale parameters based on posterior quantiles or the
maximum likelihood estimates is robust to outliers.

Even if the results are asymptotic, they are still useful in practice with
data, as illustrated by the threshold feature in Section 5.1. When one
observation moves away from the rest of data, its influence on the inference
begins to increase gradually, because it brings additional information that
helps us discriminate among the possible values of the parameter. However,
there comes a point where this moving observation conflicts with the rest
of data. When this threshold is reached, the model automatically resolves
the conflict by progressively reducing the influence of the outlying
observation. As the conflict grows infinitely, the impact of the outlier completely
disappears. This built-in feature is attractive in practice in that conflict is
managed in a sensitive and automatic way.

Estimating the location and scale parameters is common in statistics,
using, for instance, the well-known sample mean and standard deviation.
Results found in this paper can be readily used in practice to address this
problem in a robust way, whether one prefers the Bayesian approach or
maximum likelihood estimation. We consider a realistic sample of any size
with multiple possible outliers in any direction. The assumption of a
symmetric density $f$ with the same tail behavior seems reasonable for most of
the applications. Because we do not know beforehand which observations
are going to be outlying, it is generally desirable to give each density and
each tail the same weight, and to let the largest group dominate in case of
conflict. The choice of the appropriate density is addressed in a practical
way by introducing the family of log-Pareto-tailed symmetric distributions.
Furthermore, the model allows us to add prior information on the location
and scale through a very general joint prior density, which includes the
possibility to choose a noninformative prior.

This paper can be generalized in different ways. For instance, we can
consider asymmetric densities $f$ with different tail behavior. The family
of log-regularly varying distributions could be widened to consider, for
instance, distributions with a right tail proportional to $(1/z)\exp(-\delta(\log z)^{\gamma})$,
with $0 < \gamma < 1$ and $\delta > 0$, which is an exponential transformation of the
function $\exp(-\delta z^{\gamma})$. Robustness to misspecification of the prior can also be
investigated.

7. Proofs. The proof of Proposition 5 is given in Section 7.1, and the 
proof of result (a) of Theorem 1 is given in Section 7.2. 

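7.1. Proof of Proposition 5. To prove that $\pi(\mu,\sigma \mid \mathbf{x}_n)$ is proper [the
proof for $\pi(\mu,\sigma \mid \mathbf{x}_k)$ is omitted because it is similar], it suffices to show
that the marginal $m(\mathbf{x}_n)$ is finite. Without loss of generality, we assume for
convenience that $x_1 < x_2 < \cdots < x_n$. We also define the constant $\delta > 0$ as
half the minimum distance between two observations, that is,

$$\delta = \min_{i \in \{1,\ldots,n-1\}} \{(x_{i+1} - x_i)/2\}.$$

We first consider $\mu \in \mathbb{R}$ and $\delta/M \le \sigma < \infty$, where M is the constant of
monotonicity given in equation (1). Then we have

$$\begin{aligned}
&\int_{\delta/M}^{\infty}\int_{-\infty}^{\infty} \pi(\mu,\sigma) \prod_{i=1}^{n} (1/\sigma) f((x_i-\mu)/\sigma)\,d\mu\,d\sigma \\
&\quad\overset{a}{\le} B^n \int_{\delta/M}^{\infty}\int_{-\infty}^{\infty} (1/\sigma)^n (1/\sigma) f((x_1-\mu)/\sigma)\,d\mu\,d\sigma \\
&\quad\overset{b}{=} B^n \int_{\delta/M}^{\infty} (1/\sigma)^n\,d\sigma \int_{-\infty}^{\infty} f(\mu')\,d\mu' \\
&\quad\overset{c}{=} B^n (M/\delta)^{n-1}/(n-1) < \infty.
\end{aligned}$$

In step a, we bound $\sigma\pi(\mu,\sigma)$ and $n - 1$ densities $f$ by B, where B is given
in (2). In step b, we use the change of variable $\mu' = (x_1 - \mu)/\sigma$. In step c,
we use $n \ge 2$ as assumed in the Bayesian context given in Section 3.1.

We now consider $(x_{j-1} + x_j)/2 \le \mu \le (x_j + x_{j+1})/2$, for $j = 1,\ldots,n$ and
$0 < \sigma < \delta/M$. If we define $x_0 := -\infty$ and $x_{n+1} := \infty$, the union of these n
mutually disjoint intervals constitutes the real line, that is, $-\infty < \mu < \infty$.
Then we have

$$\begin{aligned}
&\pi(\mu,\sigma) \prod_{i=1}^{n} (1/\sigma) f((x_i-\mu)/\sigma) \\
&\quad\overset{a}{\le} (1/\sigma) B \prod_{i=1}^{n} (1/\sigma) f((x_i-\mu)/\sigma) \\
&\quad= (1/\sigma) B f((x_j-\mu)/\sigma) \times (1/\sigma) \prod_{i=1\,(i\ne j)}^{n} (1/\sigma) f((x_i-\mu)/\sigma) \\
&\quad\overset{b}{\le} (1/\sigma) B f((x_j-\mu)/\sigma) \times (1/\sigma)\,[(1/\sigma) f(\delta/\sigma)]^{n-1} \\
&\quad\overset{c}{\le} B (B/\delta)^{n-2} (1/\sigma) f((x_j-\mu)/\sigma) \times (1/\sigma)^2 f(\delta/\sigma) \\
&\quad\propto (1/\sigma) f((x_j-\mu)/\sigma) \times (\delta/\sigma^2) f(\delta/\sigma).
\end{aligned}$$

In step a, we bound $\sigma\pi(\mu,\sigma)$ by B. In step b, we use $f((x_i - \mu)/\sigma) \le f(\delta/\sigma)$
by the monotonicity of the tails of $f(z)$ since $|x_i - \mu|/\sigma \ge \delta/\sigma \ge \delta(M/\delta) = M$,
because if $i \ne j$, we have

$$|x_i - \mu| \ge \min\{(x_j - x_{j-1})/2, (x_{j+1} - x_j)/2\} \ge \delta.$$

In step c, we bound $(1/\sigma) f(\delta/\sigma)$ by $B/\delta$ for $n - 2$ terms. Finally, we have

$$\begin{aligned}
&\int_0^{\delta/M} (\delta/\sigma^2) f(\delta/\sigma) \int_{(x_{j-1}+x_j)/2}^{(x_j+x_{j+1})/2} (1/\sigma) f((x_j-\mu)/\sigma)\,d\mu\,d\sigma \\
&\quad\le \int_0^{\infty} f(\sigma')\,d\sigma' \int_{-\infty}^{\infty} f(\mu')\,d\mu' = 1/2 < \infty,
\end{aligned}$$

where we use the changes of variable $\sigma' = \delta/\sigma$ and $\mu' = (x_j - \mu)/\sigma$.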

7.2. Proof of result (a) of Theorem 1. Consider the model described in
Section 3.1, and assume that the conditions of regularity on $f$ are satisfied.
We also assume that $zf(z) \in L_\rho(\infty)$ and $k > \max(l, r)$, as given in
Theorem 1. Two lemmas are first given, and the proof of result (a) follows.

Lemma 1. $\forall \lambda \ge 0$, $\forall \tau \ge 1$, there exists a constant $D(\lambda,\tau) \ge 1$ such that
$z \in \mathbb{R}$ and $(\mu,\sigma) \in [-\lambda,\lambda] \times [1/\tau,\tau] \Rightarrow$

$$1/D(\lambda,\tau) \le (1/\sigma) f((z-\mu)/\sigma)/f(z) \le D(\lambda,\tau).$$

Proof. Proposition 4 states that $(1/\sigma) f((z-\mu)/\sigma)/f(z)$ converges to 1
uniformly in any set $(\mu,\sigma) \in E_{\lambda,\tau}$ as $z \to \infty$, where $E_{\lambda,\tau} = [-\lambda,\lambda] \times [1/\tau,\tau]$.
Hence, $\forall \lambda \ge 0$ and $\forall \tau \ge 1$, the ratio $(1/\sigma) f((z-\mu)/\sigma)/f(z)$ can be bounded,
say by 1/1.01 and 1.01, if $|z|$ is larger than a certain constant, say $A(\lambda,\tau)$,
using the symmetry of $f$. Therefore, we choose $D(\lambda,\tau) \ge 1.01$.

If $-A(\lambda,\tau) \le z \le A(\lambda,\tau)$, we observe that $|z - \mu|/\sigma$ is also bounded on
$(\mu,\sigma) \in E_{\lambda,\tau}$. Therefore, since $f$ is continuous and strictly positive on $\mathbb{R}$,
it follows that $\forall \lambda \ge 0$ and $\forall \tau \ge 1$, we can find a constant $D(\lambda,\tau) \ge 1.01$
as large as we want such that the ratio $(1/\sigma) f((z-\mu)/\sigma)/f(z)$ is bounded
below by $1/D(\lambda,\tau)$ and above by $D(\lambda,\tau)$, for any $(\mu,\sigma) \in E_{\lambda,\tau}$. □

Lemma 2. There exists a constant $C > 0$ such that

$$|z| \ge 2M \Rightarrow \frac{f(\mu) f(z-\mu)}{f(z)} \le C \quad \text{for all } \mu \in \mathbb{R},$$

where M is given in equation (1).

Proof. Let the constant $C = 2D(0,2)B$, where B is given in equation (2),
and $D(0,2)$ comes from Lemma 1. Consider $|z| \ge 2M$.

First, consider $0 \le |\mu| \le |z|/2$. We have

$$\frac{f(\mu) f(z-\mu)}{f(z)} \overset{a}{\le} \frac{f(\mu) f(z/2)}{f(z)} \overset{b}{\le} 2D(0,2) f(\mu) \overset{c}{\le} 2D(0,2)B = C.$$

In step a, we use $f(z - \mu) \le f(z/2)$ by the monotonicity of the tails of $f$ since
$|z - \mu| \ge |z|/2 \ge (2M)/2 = M$. In step b, we use $(1/2) f(z/2)/f(z) \le D(0,2)$
using Lemma 1. In step c, we bound $f$ by B.

Second, consider $|z|/2 \le |\mu| < \infty$. We have

$$\frac{f(\mu) f(z-\mu)}{f(z)} \le \frac{f(z/2) f(z-\mu)}{f(z)} \le 2D(0,2) f(z-\mu) \le 2D(0,2)B = C,$$

using $f(\mu) \le f(z/2)$ in the first inequality by the monotonicity of the tails
of $f$ since $|\mu| \ge |z|/2 \ge (2M)/2 = M$ and the same arguments as above for
the other inequalities. □

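We first observe that

$$\frac{m(\mathbf{x}_n)}{m(\mathbf{x}_k)\prod_{i=1}^{n}[f(x_i)]^{l_i+r_i}} = \int_{-\infty}^{\infty}\int_0^{\infty} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu.$$

Therefore, we show that the last integral converges to 1 as $\omega \to \infty$ to prove
result (a). If we use Lebesgue's dominated convergence theorem to pass the
limit $\omega \to \infty$ inside the integral, we have

$$\begin{aligned}
&\lim_{\omega\to\infty} \int_{-\infty}^{\infty}\int_0^{\infty} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu \\
&\quad= \int_{-\infty}^{\infty}\int_0^{\infty} \lim_{\omega\to\infty} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu \\
&\quad= \int_{-\infty}^{\infty}\int_0^{\infty} \pi(\mu,\sigma \mid \mathbf{x}_k)\,d\sigma\,d\mu = 1,
\end{aligned}$$

using Proposition 4 in the second equality and Proposition 5 in the last one.
Note that pointwise convergence is sufficient, for any value of $\mu \in \mathbb{R}$ and
$\sigma > 0$, once the limit is passed inside the integral.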

However, in order to use Lebesgue's dominated convergence theorem,
we need to show that $\pi(\mu,\sigma \mid \mathbf{x}_k)\prod_{i=1}^{n}[(1/\sigma) f((x_i-\mu)/\sigma)/f(x_i)]^{l_i+r_i}$ is
bounded, for any value of $\omega \ge x_0$, by an integrable function of $\mu$ and $\sigma$
that does not depend on $\omega$. The constant $x_0$ can be chosen as large as we
want, and some minimum values for $x_0$ will be given throughout the proof.

To achieve this, we divide the domain of integration into four quadrants
delineated by the axes $\mu = 0$ and $\sigma = 1$. Note that the proofs are only given
for the two quadrants in the region of $\mu \ge 0$ because the proofs for $\mu < 0$ are
similar.

We choose the constant $x_0$ larger than a certain threshold such that the
ranking of the set $\{|x_i| : l_i + r_i = 1\}$ remains unchanged for all $\omega \ge x_0$. Given
that each observation $x_i$ can be written as $x_i = a_i + b_i\omega$, with $b_i = 0$ if
$k_i = 1$, $b_i < 0$ if $l_i = 1$ and $b_i > 0$ if $r_i = 1$, the ranking is therefore primarily
determined by the values of $|b_i|$. Then, without loss of generality, we assume
for convenience that

$$\min_{i:\,l_i+r_i=1}\{|b_i|\} = 1 \quad\text{and}\quad \omega = \min_{i:\,l_i+r_i=1}\{|x_i|\}.$$

If $l_i + r_i = 1$, we can use Lemma 1, with $x_i = a_i + b_i\omega = b_i(\omega + a_i/b_i)$ and
$|b_i| \ge 1$, to establish that the ratio $f(x_i)/f(\omega)$ is bounded, precisely by

$$1/D(|a_i/b_i|, |b_i|) \le |b_i| f(x_i)/f(\omega) \le D(|a_i/b_i|, |b_i|).$$


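Quadrant 1. Consider $0 \le \mu < \infty$ and $1 \le \sigma < \infty$. We have

$$\begin{aligned}
&\pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} \\
&\quad\propto \pi(\mu,\sigma) \prod_{i=1}^{n} \frac{(1/\sigma) f((x_i-\mu)/\sigma)}{[f(x_i)]^{l_i+r_i}} \\
&\quad\overset{a}{\le} \frac{B}{\sigma^{n+1}} \prod_{i=1}^{n} \frac{D(|a_i|,1)\, f((b_i\omega-\mu)/\sigma)}{[f(x_i)]^{l_i+r_i}} \\
&\quad\overset{b}{\le} \frac{B}{\sigma^{n+1}} \prod_{i=1}^{n} D(|a_i|,1)\, f((b_i\omega-\mu)/\sigma) \left[\frac{|b_i|\, D(|a_i/b_i|,|b_i|)}{f(\omega)}\right]^{l_i+r_i} \\
&\quad\propto \frac{1}{[f(\omega)]^{l+r}\sigma^{n+1}} \prod_{i=1}^{n} f((b_i\omega-\mu)/\sigma) \\
&\quad\overset{c}{=} \frac{[f(\mu/\sigma)]^{k}}{[f(\omega)]^{l+r}\sigma^{n+1}} \prod_{i=1}^{n} [f((b_i\omega-\mu)/\sigma)]^{l_i+r_i} \\
&\quad= \left(\frac{1}{\sigma}\right)^{k-1/2} \frac{1}{\sigma} f(\mu/\sigma) \times \left[\frac{\omega/\sigma}{\omega f(\omega)}\right]^{l+r} \frac{[f(\mu/\sigma)]^{k-1}}{\sigma^{1/2}} \prod_{i=1}^{n} [f((b_i\omega-\mu)/\sigma)]^{l_i+r_i}.
\end{aligned}$$

In step a, we use $x_i = a_i + b_i\omega$ and

$$f((x_i - \mu)/\sigma) = f((b_i\omega - \mu)/\sigma + a_i/\sigma) \le D(|a_i|, 1)\, f((b_i\omega - \mu)/\sigma)$$

using Lemma 1 since $|a_i/\sigma| \le |a_i|$. We also bound $\sigma\pi(\mu,\sigma)$ by B. In step
b, we use $1/f(x_i) \le |b_i| D(|a_i/b_i|, |b_i|)/f(\omega)$. In step c, we set $b_i = 0$ if $k_i = 1$
and we use $f(-\mu/\sigma) = f(\mu/\sigma)$ by symmetry of $f$.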

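It suffices to show that

$$\left[\frac{\omega/\sigma}{\omega f(\omega)}\right]^{l+r} \frac{[f(\mu/\sigma)]^{k-1}}{\sigma^{1/2}} \prod_{i=1}^{n} [f((b_i\omega-\mu)/\sigma)]^{l_i+r_i} < \infty,$$

since $(1/\sigma)^{k-1/2}(1/\sigma) f(\mu/\sigma)$ is an integrable function on Quadrant 1,

$$\int_1^{\infty}\int_0^{\infty} (1/\sigma)^{k-1/2} (1/\sigma) f(\mu/\sigma)\,d\mu\,d\sigma \le \int_1^{\infty} (1/\sigma)^{k-1/2}\,d\sigma = \frac{1}{k-3/2} \le 2,$$

since $k \ge 2$. To achieve this, we split the region of $\sigma$ into three parts between
$1 \le \omega^{1/2} \le \omega/(2M) < \infty$, where M is defined in equation (1). Note that since
$\omega \ge x_0$, this is well defined if $x_0 \ge \max(1, (2M)^2)$.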

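Consider $0 \le \mu < \infty$ and $\omega/(2M) \le \sigma < \infty$. Then we have

$$\begin{aligned}
&\left[\frac{\omega/\sigma}{\omega f(\omega)}\right]^{l+r} \frac{[f(\mu/\sigma)]^{k-1}}{\sigma^{1/2}} \prod_{i=1}^{n} [f((b_i\omega-\mu)/\sigma)]^{l_i+r_i} \\
&\quad\overset{a}{\le} B^{n-1} \frac{(\omega/\sigma)^{l+r} (1/\sigma)^{1/2}}{[\omega f(\omega)]^{l+r}} \\
&\quad\overset{b}{\le} B^{n-1} (2M)^{l+r+1/2} \frac{(1/\omega)^{1/2}}{[\omega f(\omega)]^{l+r}} \\
&\quad\overset{c}{\le} B^{n-1} (2M)^{l+r+1/2} \frac{(1/\omega)^{1/2}}{(\log\omega)^{-(\rho+1)(l+r)}} \\
&\quad\overset{d}{\le} B^{n-1} (2M)^{l+r+1/2} [2(\rho+1)(l+r)/e]^{(\rho+1)(l+r)} < \infty.
\end{aligned}$$

In step a, we use $f(\cdot) \le B$. In step b, we use $\omega/\sigma \le 2M$ and $(1/\sigma) \le (2M)/\omega$.
In step c, we use $\omega f(\omega) \ge (\log\omega)^{-\rho-1}$ if $\omega \ge x_0 \ge A(1)$, where $A(1)$ comes
from Proposition 2. In step d, it is purely algebraic to show that the maximum
of $(\log\omega)^{\beta}/\omega^{1/2}$ is $(2\beta/e)^{\beta}$ for $\omega > 1$ and $\beta > 0$, where $\beta = (\rho+1)(l+r)$
in our equation.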
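The algebraic fact used in step d can be spelled out as follows. This is a short sketch in which the symbol $c$ (the reciprocal of the power of $\omega$, so $c = 2$ here) is introduced only for this derivation.

```latex
\begin{aligned}
h(\omega) &= \frac{(\log\omega)^{\beta}}{\omega^{1/c}},
  \qquad \omega > 1,\ \beta > 0,\ c > 0, \\
\frac{d}{d\omega}\log h(\omega)
  &= \frac{\beta}{\omega\log\omega} - \frac{1}{c\,\omega} = 0
  \iff \log\omega = c\beta, \\
\max_{\omega > 1} h(\omega)
  &= \frac{(c\beta)^{\beta}}{e^{\beta}} = \left(\frac{c\beta}{e}\right)^{\beta}.
\end{aligned}
```

The derivative of $\log h$ is positive for $\log\omega < c\beta$ and negative afterwards, so the stationary point is the maximum. With $c = 2$ this gives $(2\beta/e)^{\beta}$, and the same computation with $c = 4$ gives $(4\beta/e)^{\beta}$ for $(\log\omega)^{\beta}/\omega^{1/4}$.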

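Now consider the two other parts combined (we will split them in the
next step), that is, $0 \le \mu < \infty$ and $1 \le \sigma \le \omega/(2M)$. We have

$$\begin{aligned}
&\left[\frac{\omega/\sigma}{\omega f(\omega)}\right]^{l+r} \frac{[f(\mu/\sigma)]^{k-1}}{\sigma^{1/2}} \prod_{i=1}^{n} [f((b_i\omega-\mu)/\sigma)]^{l_i+r_i} \\
&\quad\overset{a}{\le} \left[\frac{\omega/\sigma}{\omega f(\omega)}\right]^{l+r} \frac{[f(\mu/\sigma)]^{k-1}}{\sigma^{1/2}} \prod_{i=1}^{n} [f(b_i\omega/\sigma)]^{l_i} [f((b_i\omega-\mu)/\sigma)]^{r_i} \\
&\quad= \frac{[f(\mu/\sigma)]^{k-r-1}}{\sigma^{1/2}} \prod_{i=1}^{n} \left[\frac{(\omega/\sigma) f(b_i\omega/\sigma)}{\omega f(\omega)}\right]^{l_i+r_i} \left[\frac{f(b_i\omega/\sigma - \mu/\sigma) f(\mu/\sigma)}{f(b_i\omega/\sigma)}\right]^{r_i} \\
&\quad\overset{b}{\le} B^{k-r-1} C^{r} \frac{1}{\sigma^{1/2}} \prod_{i=1}^{n} \left[\frac{(\omega/\sigma) f(b_i\omega/\sigma)}{\omega f(\omega)}\right]^{l_i+r_i} \\
&\quad\overset{c}{\le} B^{k-r-1} C^{r} \frac{1}{\sigma^{1/2}} \left[\frac{(\omega/\sigma) f(\omega/\sigma)}{\omega f(\omega)}\right]^{l+r}.
\end{aligned}$$

In step a, we use $f((b_i\omega - \mu)/\sigma) \le f(b_i\omega/\sigma)$ if $l_i = 1$ (which means $b_i < 0$) by
the monotonicity of the tails of $f$ since $|b_i\omega - \mu|/\sigma \ge |b_i|\omega/\sigma \ge |b_i|(2M) \ge 2M \ge M$.
In step b, we use $f(\mu/\sigma) \le B$ and we use Lemma 2 since $|b_i|\omega/\sigma \ge |b_i|(2M) \ge 2M$.
In step c, we use $f(b_i\omega/\sigma) \le f(\omega/\sigma)$ by the monotonicity of
the tails of $f$ since $|b_i|\omega/\sigma \ge \omega/\sigma \ge 2M \ge M$.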

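Consider $0 \le \mu < \infty$ and $\omega^{1/2} \le \sigma \le \omega/(2M)$. We have

$$\begin{aligned}
\frac{1}{\sigma^{1/2}} \left[\frac{(\omega/\sigma) f(\omega/\sigma)}{\omega f(\omega)}\right]^{l+r}
&\overset{a}{\le} B^{l+r} \frac{(1/\omega)^{1/4}}{[\omega f(\omega)]^{l+r}} \\
&\overset{b}{\le} B^{l+r} \frac{(1/\omega)^{1/4}}{(\log\omega)^{-(\rho+1)(l+r)}} \\
&\overset{c}{\le} B^{l+r} [4(\rho+1)(l+r)/e]^{(\rho+1)(l+r)} < \infty.
\end{aligned}$$

In step a, we use $(\omega/\sigma) f(\omega/\sigma) \le B$ and $(1/\sigma)^{1/2} \le (1/\omega)^{1/4}$. In step b, we use
$\omega f(\omega) \ge (\log\omega)^{-\rho-1}$ if $\omega \ge x_0 \ge A(1)$, where $A(1)$ comes from Proposition 2.
In step c, it is purely algebraic to show that the maximum of $(\log\omega)^{\beta}/\omega^{1/4}$
is $(4\beta/e)^{\beta}$ for $\omega > 1$ and $\beta > 0$, where $\beta = (\rho+1)(l+r)$ in our equation.

Finally consider $0 \le \mu < \infty$ and $1 \le \sigma \le \omega^{1/2}$. Then we have

$$\frac{1}{\sigma^{1/2}} \left[\frac{(\omega/\sigma) f(\omega/\sigma)}{\omega f(\omega)}\right]^{l+r} \overset{a}{\le} \left[\frac{\omega^{1/2} f(\omega^{1/2})}{\omega f(\omega)}\right]^{l+r} \overset{b}{\le} 2^{(\rho+1)(l+r)} < \infty.$$

In step a, we use $1/\sigma \le 1$, and we use $(\omega/\sigma) f(\omega/\sigma) \le \omega^{1/2} f(\omega^{1/2})$ by the
monotonicity of the tails of $|z|f(z)$ since $\omega/\sigma \ge \omega^{1/2} \ge x_0^{1/2} \ge M$ if $x_0 \ge M^2$.
In step b, we use $\omega^{1/2} f(\omega^{1/2})/(\omega f(\omega)) \le 2(1/2)^{-\rho} = 2^{\rho+1}$ if $\omega \ge x_0 \ge A(1,2)$,
where $A(1,2)$ comes from the definition of a log-regularly varying function.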


Quadrant 2. Consider $-\infty < \mu < 0$ and $1 \le \sigma < \infty$. The proof for
Quadrant 2 is similar to that of Quadrant 1.


Quadrant 3. Consider $-\infty < \mu < 0$ and $0 < \sigma < 1$. The proof for
Quadrant 3 is similar to that of Quadrant 4, given below. The condition $k > r$ is
therefore replaced by $k > l$. Note that $k > \max(l, r)$ is assumed in Theorem 1.


Quadrant 4. Consider $0 \le \mu < \infty$ and $0 < \sigma < 1$. We need to show,
actually, that

$$\lim_{\omega\to\infty} \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu = \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k)\,d\sigma\,d\mu.$$

For Quadrant 1, we show this result when we integrate $\sigma$ between 1 and
$\infty$. We bound the integrand of the left term, for any value of $\omega \ge x_0$, by an
integrable function of $\mu$ and $\sigma$ that does not depend on $\omega$, in order to use
Lebesgue's dominated convergence theorem to pass the limit $\omega \to \infty$ inside
the integral. For Quadrant 4, we proceed slightly differently. We begin by
breaking down the left term into two parts as follows:

$$\begin{aligned}
&\lim_{\omega\to\infty} \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu \\
&\quad= \lim_{\omega\to\infty} \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} \times 1_{[0,\omega/2]}(\mu)\,d\sigma\,d\mu \\
&\qquad+ \lim_{\omega\to\infty} \int_{\omega/2}^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu,
\end{aligned}$$

where the indicator function $1_A(\mu)$ is equal to 1 if $\mu \in A$, and equal to 0
otherwise. We then show that the first part is equal to the integral
$\int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k)\,d\sigma\,d\mu$ and the second part is equal to 0.

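For the first equality, we again use Lebesgue's dominated convergence
theorem to pass the limit $\omega \to \infty$ inside the integral. We have

$$\begin{aligned}
&\lim_{\omega\to\infty} \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} 1_{[0,\omega/2]}(\mu)\,d\sigma\,d\mu \\
&\quad= \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \lim_{\omega\to\infty} \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} \times 1_{[0,\omega/2]}(\mu)\,d\sigma\,d\mu \\
&\quad= \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \times 1 \times 1_{[0,\infty)}(\mu)\,d\sigma\,d\mu \\
&\quad= \int_0^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k)\,d\sigma\,d\mu,
\end{aligned}$$

using Proposition 4 in the second equality. Note that pointwise convergence
is sufficient, for any value of $\mu$ and $\sigma > 0$, once the limit is passed inside
the integral. However, in order to use Lebesgue's dominated convergence
theorem, we need to show that for any value of $\omega \ge x_0$, the integrand is
bounded by an integrable function of $\mu$ and $\sigma$ that does not depend on $\omega$.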

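Consider $0 \le \mu \le \omega/2$ (the integrand is equal to 0 if $\omega/2 < \mu < \infty$) and
$0 < \sigma < 1$. We have

$$\begin{aligned}
&\pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} 1_{[0,\omega/2]}(\mu) \\
&\quad\overset{a}{\le} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[2D(0,2) \frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i/2)}\right]^{l_i+r_i} \\
&\quad\propto \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i/2)}\right]^{l_i+r_i} \\
&\quad\overset{b}{\le} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{f(x_i-\mu)}{f(x_i/2)}\right]^{l_i+r_i} \\
&\quad\overset{c}{\le} \pi(\mu,\sigma \mid \mathbf{x}_k),
\end{aligned}$$

and $\pi(\mu,\sigma \mid \mathbf{x}_k)$ is an integrable function. In step a, we use $1_{[0,\omega/2]}(\mu) = 1$
and $(1/2) f(x_i/2)/f(x_i) \le D(0,2)$ using Lemma 1. In step b, we use
$(|x_i - \mu|/\sigma) f((x_i - \mu)/\sigma) \le |x_i - \mu| f(x_i - \mu)$ by the monotonicity of the tails of
$|z|f(z)$, and in step c we use $f(x_i - \mu) \le f(x_i/2)$ by the monotonicity of
the tails of $f(z)$ since $|x_i - \mu|/\sigma \ge |x_i - \mu| \ge |x_i|/2 \ge \omega/2 \ge x_0/2 \ge M$, if we
choose $x_0 \ge 2M$. Note that the condition $\mu \le \omega/2$ $(\le x_i/2)$ is used only to
justify $|x_i - \mu| \ge |x_i|/2$ when $r_i = 1$.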

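Now we show the second equality, that is,

$$\lim_{\omega\to\infty} \int_{\omega/2}^{\infty}\int_0^{1} \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} d\sigma\,d\mu = 0.$$

We first bound above the integrand, and then we show that the integral of
the upper bound converges to 0 as $\omega \to \infty$.

Consider $\omega/2 \le \mu < \infty$ and $0 < \sigma < 1$. We have

$$\begin{aligned}
&\pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(x_i)}\right]^{l_i+r_i} \\
&\quad\overset{a}{\le} [2D(0,2)]^{l}\, \pi(\mu,\sigma \mid \mathbf{x}_k) \prod_{i=1}^{n} \left[\frac{|b_i|\, D(|a_i/b_i|,|b_i|)\,(1/\sigma) f((x_i-\mu)/\sigma)}{f(\omega)}\right]^{r_i} \\
&\quad\propto \pi(\mu,\sigma) \prod_{i=1}^{n} [(1/\sigma) f((a_i-\mu)/\sigma)]^{k_i} \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(\omega)}\right]^{r_i} \\
&\quad\overset{b}{\le} (1/\sigma) B\, [4D(0,4)(1/\sigma) f(\omega/\sigma)]^{k} \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(\omega)}\right]^{r_i} \\
&\quad\propto (1/\sigma)\, [(1/\sigma) f(\omega/\sigma)]^{k} \prod_{i=1}^{n} \left[\frac{(1/\sigma) f((x_i-\mu)/\sigma)}{f(\omega)}\right]^{r_i} \\
&\quad\overset{c}{\le} (1/\sigma)\, [(1/\sigma) f(\omega/\sigma)]^{k-r} \prod_{i=1}^{n} [(1/\sigma) f((x_i-\mu)/\sigma)]^{r_i} \\
&\quad\overset{d}{=} (1/\sigma)\, [(1/\sigma) f(\omega/\sigma)]^{k-r} \prod_{i=1}^{r} (1/\sigma) f((x_i-\mu)/\sigma).
\end{aligned}$$

In step a, we use $(1/\sigma) f((x_i - \mu)/\sigma)/f(x_i) \le 2D(0,2)$ if $l_i = 1$, using the
same arguments given above for the case $0 \le \mu \le \omega/2$. We also use
$1/f(x_i) \le |b_i| D(|a_i/b_i|, |b_i|)/f(\omega)$ if $r_i = 1$. In step b, we bound $\sigma\pi(\mu,\sigma)$ by B. We also
use

$$f((a_i - \mu)/\sigma) \le f((1/4)\omega/\sigma) \le 4D(0,4) f(\omega/\sigma)$$

if $k_i = 1$ using the monotonicity of the tails of $f(z)$ in the first inequality
since, if we define $a_{(k)} = \max_{i:\,k_i=1}\{|a_i|\}$ with $\omega \ge x_0 \ge 4a_{(k)}$, we have
$|a_i - \mu|/\sigma = (\mu - a_i)/\sigma \ge (\omega/2 - a_{(k)})/\sigma \ge (\omega/2 - \omega/4)/\sigma = (1/4)\omega/\sigma \ge \omega/4 \ge x_0/4 \ge M$
if we choose $x_0 \ge 4M$. We use Lemma 1 in the second inequality.
In step c, we use $(\omega/\sigma) f(\omega/\sigma) \le \omega f(\omega)$ using the monotonicity of the tails of
$|z|f(z)$ since $\omega/\sigma \ge \omega \ge x_0 \ge M$ if we choose $x_0 \ge M$. In step d, we assume
for convenience and without loss of generality that the right outliers are
denoted by $x_1 < x_2 < \cdots < x_r$.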

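We now split the real line (which includes the region $\omega/2 \le \mu < \infty$) into
r mutually disjoint intervals given by $(x_{j-1} + x_j)/2 \le \mu \le (x_j + x_{j+1})/2$, for
$j = 1,\ldots,r$, where we define $x_0 := -\infty$ and $x_{r+1} := \infty$. We also define the
constant $\delta > 0$ as

$$\delta = \min_{i \in \{1,\ldots,r-1\}} \{(x_{i+1} - x_i)/2\}.$$

Consider $(x_{j-1} + x_j)/2 \le \mu \le (x_j + x_{j+1})/2$, for $j = 1,\ldots,r$ and $0 < \sigma < 1$.
Then we have

$$\begin{aligned}
&(1/\sigma)\,[(1/\sigma) f(\omega/\sigma)]^{k-r} \prod_{i=1}^{r} (1/\sigma) f((x_i-\mu)/\sigma) \\
&\quad\overset{a}{\le} (B/\delta)^{r-1} (1/\sigma)\,[(1/\sigma) f(\omega/\sigma)]^{k-r} (1/\sigma) f((x_j-\mu)/\sigma) \\
&\quad\overset{b}{\le} (B/\delta)^{r-1} B^{k-r-1} \omega^{-(k-r)} (\omega/\sigma^2) f(\omega/\sigma) \times (1/\sigma) f((x_j-\mu)/\sigma).
\end{aligned}$$

In step a, we use, for $i \ne j$, $(1/\sigma) f((x_i - \mu)/\sigma) \le B/|x_i - \mu| \le B/\delta$, where
we bound $|z|f(z)$ by B, and we use $|x_i - \mu| \ge \delta$ because if $i \ne j$, we have

$$|x_i - \mu| \ge \min\{(x_j - x_{j-1})/2, (x_{j+1} - x_j)/2\} \ge \delta.$$

In step b, we use $(\omega/\sigma) f(\omega/\sigma) \le B$ for $k - r - 1$ terms. Finally, we have

$$\begin{aligned}
&\omega^{-(k-r)} \int_0^{1} (\omega/\sigma^2) f(\omega/\sigma) \int_{(x_{j-1}+x_j)/2}^{(x_j+x_{j+1})/2} (1/\sigma) f((x_j-\mu)/\sigma)\,d\mu\,d\sigma \\
&\quad\le \omega^{-(k-r)} \int_0^{\infty} (\omega/\sigma^2) f(\omega/\sigma) \int_{-\infty}^{\infty} (1/\sigma) f((x_j-\mu)/\sigma)\,d\mu\,d\sigma \\
&\quad\overset{a}{=} \omega^{-(k-r)} \int_0^{\infty} f(\sigma')\,d\sigma' \int_{-\infty}^{\infty} f(\mu')\,d\mu' \le \omega^{-(k-r)} \overset{b}{\to} 0 \quad\text{as } \omega \to \infty.
\end{aligned}$$

In step a, we use the changes of variable $\sigma' = \omega/\sigma$ and $\mu' = (x_j - \mu)/\sigma$. In
step b, we use the condition $k > r$.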